Beyond Basic Search: Overcoming the Limitations of Semantic Similarity

Beyond Similarity

The "80% Problem" occurs when basic semantic search works for simple queries but fails on edge cases. When we search by similarity alone, the vector store returns the chunks that are numerically closest to the query. If those chunks are nearly identical to one another, the language model receives redundant information, which wastes the limited context window and loses broader perspectives.

Advanced Retrieval Pillars

Maximal Marginal Relevance (MMR): Instead of selecting only the most similar items, MMR balances relevance against diversity to avoid redundancy. $\text{MMR} = \operatorname*{arg\,max}_{d \in R \setminus S} \left[ \lambda \cdot \text{sim}(d, q) - (1 - \lambda) \cdot \max_{s \in S} \text{sim}(d, s) \right]$, where $q$ is the query, $R$ is the set of retrieved candidates, $S$ is the set of items already selected, and $\lambda \in [0, 1]$ weights relevance against diversity.
Self-Querying: Uses the language model to turn natural language into structured metadata filters (for example, filtering by "Lecture 3" or "Source: PDF").
Contextual Compression: Shrinks retrieved documents down to only the "high-nutrient" passages relevant to the query, saving tokens.
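
As an illustration of the MMR formula, here is a minimal pure-Python sketch of greedy MMR selection. The toy 2-D vectors, the `cosine` helper, and the `mmr_select` function are assumptions made for this example; they are not taken from any particular vector store, which would normally supply real embeddings and its own MMR search.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr_select(query_vec, doc_vecs, k, lam=0.5):
    """Greedily pick k document indices, trading relevance against redundancy."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(doc_vecs[i], query_vec)
            # Highest similarity to anything already chosen (0.0 on the first pick).
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical toy embeddings: documents 0 and 1 are near-duplicates.
query = [1.0, 0.0]
docs = [
    [1.0, 0.10],   # 0: highly relevant
    [1.0, 0.12],   # 1: almost identical to document 0
    [0.8, -0.60],  # 2: less relevant, but covers a different direction
]

top_by_similarity = sorted(
    range(len(docs)), key=lambda i: cosine(docs[i], query), reverse=True
)[:2]
top_by_mmr = mmr_select(query, docs, k=2, lam=0.5)

print(top_by_similarity)  # [0, 1]
print(top_by_mmr)         # [0, 2]
```

Plain similarity ranking returns the two near-duplicates, while MMR with $\lambda = 0.5$ swaps the second one for the more distinct document. Raising `lam` toward 1 moves the result back toward pure similarity ranking.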

The Redundancy Trap

Feeding the language model three versions of the same paragraph does not make it smarter; it only makes the request more expensive. Diversity is essential for a "high-nutrient" context.

Knowledge Check

You want your system to answer "What did the instructor say about probability in the third lecture?" specifically. Which tool allows the LLM to automatically apply a filter for { "source": "lecture3.pdf" }?

ConversationBufferMemory

Self-Querying Retriever

Contextual Compression

MapReduce Chain
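
To make the filter in the question concrete, the sketch below applies a structured metadata filter before ranking by relevance. Everything here is a hypothetical stand-in: `parse_query` uses a regular expression where a real system would ask the language model to produce the filter, the `lectureN.pdf` naming scheme is assumed, and the word-overlap ranking replaces a real similarity search.

```python
import re

# Hypothetical chunk store: each chunk carries text plus metadata.
CHUNKS = [
    {"text": "Probability measures how likely an event is.",
     "metadata": {"source": "lecture3.pdf"}},
    {"text": "Probability was introduced briefly.",
     "metadata": {"source": "lecture1.pdf"}},
    {"text": "Regression fits a line to data.",
     "metadata": {"source": "lecture3.pdf"}},
]

ORDINALS = {"first": 1, "second": 2, "third": 3}

def parse_query(question):
    """Stand-in for the LLM step: derive a metadata filter from the question."""
    match = re.search(r"\b(first|second|third) lecture\b", question.lower())
    if match:
        return {"source": f"lecture{ORDINALS[match.group(1)]}.pdf"}
    return {}

def words(text):
    return set(re.findall(r"\w+", text.lower()))

def filtered_search(question, chunks):
    metadata_filter = parse_query(question)
    # Step 1: keep only chunks whose metadata satisfies every filter key.
    matching = [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in metadata_filter.items())
    ]
    # Step 2: rank the survivors with a toy word-overlap score.
    query_words = words(question)
    return sorted(
        matching, key=lambda c: len(query_words & words(c["text"])), reverse=True
    )

question = "What did the instructor say about probability in the third lecture?"
results = filtered_search(question, CHUNKS)
for chunk in results:
    print(chunk["metadata"]["source"], "|", chunk["text"])
```

The chunk from `lecture1.pdf` also mentions probability, yet it never reaches the ranking step because the filter removes it first.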

Challenge: The Token Limit Dilemma

Apply advanced retrieval strategies to solve a real-world constraint.

You are building a RAG system for a legal firm. The documents retrieved are 50 pages long, but only 2 sentences per page are actually relevant to the user's specific query. The standard "Stuff" chain is throwing an OutOfTokens error because the context window is overflowing with irrelevant text.

Step 1

Identify the core problem and select the appropriate advanced retrieval tool to solve it without losing specific nuances.

Problem: The context window limit is being exceeded by "low-nutrient" text surrounding the relevant facts.

Tool Selection: ContextualCompressionRetriever

Step 2

What specific component must you use in conjunction with this retriever to "squeeze" the documents?

Solution: Use an LLMChainExtractor as the base compressor for the retriever. It processes each retrieved document and extracts only the snippets relevant to the query, passing a much smaller, highly concentrated context to the final prompt.
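
The idea of "squeezing" a document can be sketched without any library. The example below is a deliberately crude stand-in, not how LLMChainExtractor works internally: the hypothetical `compress` function keeps a sentence when it shares at least `min_overlap` words with the query, whereas the real extractor asks a language model to judge relevance. The sample page text is invented for illustration.

```python
import re

def words(text):
    return set(re.findall(r"\w+", text.lower()))

def compress(document, query, min_overlap=2):
    """Keep only sentences sharing at least min_overlap words with the query."""
    query_words = words(query)
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    kept = [s for s in sentences if len(query_words & words(s)) >= min_overlap]
    return " ".join(kept)

# Hypothetical page from a contract: two relevant sentences, two filler ones.
page = (
    "This agreement is governed by the laws of the state. "
    "The tenant must give 60 days notice before termination. "
    "Headings are for convenience only. "
    "Notice of termination must be delivered in writing."
)
query = "How much notice is required for termination?"

compressed = compress(page, query)
print(compressed)
print(len(page), "->", len(compressed), "characters")
```

Only the two sentences about notice and termination survive, so the final prompt receives a fraction of the original text. Word overlap misses paraphrases and synonyms, which is the gap an LLM-based extractor is meant to close.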